Abstract

The characteristics of wireless communication channels may vary with time due to fading, environmental changes 
and movement of mobile wireless devices. Tracking and estimating channel gains of wireless channels is therefore 
a fundamentally important element of many wireless communication systems. In particular, the receivers in many 
wireless networks need to estimate the channel gains by means of a training sequence. This paper studies the scaling 
law (on the network size) of the overhead for channel gain monitoring in wireless networks. We first investigate the 
scenario in which a receiver needs to track the channel gains with respect to multiple transmitters. To be concrete, 
suppose that there are n transmitters, and that in the current round of channel-gain estimation, k < n channels
have varied significantly since the last round. We prove that $\Theta(k\log((n+1)/k))$ time slots is the minimum
overhead needed to catch up with the k varied channels. Here a time slot equals one symbol duration. We also
propose a novel channel-gain monitoring scheme, named ADMOT, that achieves this overhead lower bound.
ADMOT leverages recent advances in compressive sensing in signal processing and interference processing in
wireless communication to enable the receiver to estimate all n channels in a reliable and computationally efficient
manner within $\Theta(k\log((n+1)/k))$ time slots. To the best of our knowledge, all previous channel-tracking schemes
require $\Theta(n)$ time slots regardless of k. Building on these results for the single-receiver scenario, we also achieve
the scaling law for the general setting in which there are multiple transmitters, relay nodes and receivers.

Index terms: Wireless Network, Scaling Law, Channel Gain Estimation, Compressive Sensing. 

I. Introduction 

The knowledge of channel gains is often needed in the design of high performance communication schemes.
In practice, due to fading, transmit power instability, environmental changes and movement of
mobile wireless devices, the channel gains vary with time. Tracking and estimating channel gains of wireless
channels is therefore fundamentally important.

An issue of interest is how to reduce the overhead of channel-gain estimation. On the one hand, if the channels
vary significantly between two rounds of channel-gain estimation, communication reliability will be
jeopardized. On the other hand, if the frequency of channel-gain estimation is high, the overhead will
also be high. Our approach is predicated on reducing the overhead in each round, while maintaining
high accuracy.

We first consider the case in which a receiver needs to estimate the channel gains from n transmitters.
As a mental picture, the reader could imagine the receiver to be a base station, and the transmitters to be mobile 
devices. To achieve reliable bit-error-rate (BER), the frequency of estimation should be high enough [1]. Then it is 
likely that only a few of the n channels have suffered appreciable changes since the last estimation. We make use 
of the techniques of compressive sensing and interference signal processing to reduce the time needed to perform 
the estimation in each round. We answer the following question: 

Suppose that in the current round, there are at most k < n channels suffering from appreciable channel gain 
variations. Given a target reliability for channel-gain estimation, what is the minimum overhead needed? 

We answer this question by analysis and construction. We prove that the minimum number of time slots needed for
estimation is $\Theta(k\log((n+1)/k))$, and we propose a scheme (named ADMOT) that uses $\Theta(k\log((n+1)/k))$ time
slots (see footnote 1). Note that in each time slot, every transmitter transmits one symbol. Thus, one time slot is
also one symbol duration.

The general network scenario, in which there are multiple transmitters, relay nodes and receivers, is also
studied. Again, the scaling law of estimating all network channels is achieved in a reliable, computationally
efficient and distributed manner.

A. Illustrating Example and Background Ideas 

Consider a toy network consisting of three transmitting nodes $\{S_1, S_2, S_3\}$ and one receiving node R. The three
channels $(S_1,R)$, $(S_2,R)$ and $(S_3,R)$ need to be estimated. Without loss of generality, let the initial channel
gains of all three channels be 1, and suppose one of the channel gains changes to x in the current round. The goal
of monitoring is to identify the updated channel and the value of x. A simplistic monitoring scheme is to schedule
transmissions on different channels in different time slots, as shown in Figure 1. In time slot i, for i = 1, 2, 3,
sender $S_i$ sends the probe signal 1 to node R, so that R can estimate the channel gain of $(S_i,R)$.
Thus, altogether three time slots are needed.

However, using an algebraic approach that exploits the nature of the wireless medium, two time slots are enough. As
shown in Figure 2, in the first time slot $S_1$, $S_2$ and $S_3$ all send 1 to node R. These three signals "collide" in
the air, but the collided signals turn out to be useful for our estimation. Let y[1] denote the signal received by R in
the first time slot. We have y[1] = 3 + (x - 1). In the second time slot $S_1$, $S_2$ and $S_3$ send 1, 2 and 3, respectively.
Thus, the received signal is y[2] = 6 + i(x - 1) if $(S_i,R)$ is the updated channel. At the end of the second time
slot, R computes [y[1], y[2]] - [3, 6] = (x - 1)[1, i]. Since [1, 1], [1, 2] and [1, 3] are pairwise linearly
independent, R can uniquely decode i and x.

Fig. 1. The monitoring scheme based on scheduling. The "solid-line", "dashed-line" and "dotted-line" are for the transmissions in time
slots 1, 2 and 3, respectively.

Fig. 2. A better monitoring scheme. The first and second sub-figures show the transmissions in time slots 1 and 2, respectively.

1 Note that $\Theta(k\log((n+1)/k)) = \Theta(k\log(n/k))$. In the paper we use $\Theta(k\log((n+1)/k))$ to avoid the confusing case where k = n.
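The two-slot decoding in the toy example can be checked numerically. The following is a minimal sketch under the assumptions of the example (noiseless reception, unit initial gains, exactly one changed channel); the function names `probe_two_slots` and `decode_single_change` are illustrative and do not come from the paper.

```python
def probe_two_slots(gains):
    """Return (y[1], y[2]) for noiseless two-slot group probing.

    Slot 1: every sender S_i transmits 1.
    Slot 2: sender S_i transmits i (i = 1, 2, ..., n).
    """
    y1 = sum(gains)
    y2 = sum(i * g for i, g in enumerate(gains, start=1))
    return y1, y2


def decode_single_change(y1, y2, n=3):
    """Recover (i, x), assuming all gains were 1 and exactly one changed to x."""
    d1 = y1 - n                  # equals (x - 1)
    d2 = y2 - n * (n + 1) // 2   # equals i * (x - 1)
    if d1 == 0:
        return None              # no channel has changed
    i = round(d2 / d1)
    return i, 1 + d1
```

For instance, if the gain of $(S_3,R)$ changes to 2.5, the receiver observes y[1] = 4.5 and y[2] = 10.5, and the differences 1.5 and 4.5 identify i = 3 and x = 2.5.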



We summarize the main ingredients that give rise to the above saving as follows: 

(a) Embracing Interference for Group Probing. In group probing, all the n channels are probed simultaneously
in each time slot. This is essential to get rid of the $\Theta(n)$ overhead of traditional unit probing, in which the
channels are probed one by one. Note that the $\Theta(n)$ overhead in unit probing is fundamental even if the number
of channels suffering appreciable variations, k, is much smaller than n. This is because we do not know which
channels have changed.

(b) Algebraic Distinguishability. In the illustrating example in Figure 2, for each $i \in \{1,2,3\}$, the
training data of $S_i$ (i.e., {1, i}) induces an "algebraic fingerprint" for channel $(S_i, R)$. Due to the linear independence
of the fingerprints, the one corresponding to the varied channel is not erasable even under wireless interference.
Note that it is not necessary to construct independent fingerprints by increasing the probing power. Section III
shows that probing data with uniform magnitude but random signs suffices.



B. Overview of Our Results 

For the scenario where a receiver wants to monitor the channel gains from n transmitters, we first prove that
the lower bound of the overhead is $\Theta(k\log((n+1)/k))$ time slots, where k < n is the number of channels suffering
appreciable channel gain variations since the last round of estimation. Note that when k = n, the overhead lower
bound is $\Theta(n)$.

Fig. 3. Systematic implementation of ADMOT.

We then comprehensively develop the ideas in the above toy example and propose a scheme called Algebraic
Differential SNR-Monitoring (ADMOT), which allows the receiver in the network to reliably monitor the channels
with minimum overhead. For a systematic view, Figure 3 shows ADMOT in action for successive rounds (see footnote 2).
In this figure, the network state is the set of channel gains of all channels, and the initial state is one in which
every channel has zero channel gain. For the current round of monitoring, ADMOT estimates the network state
starting from the estimate of the previous round. In the following, we summarize the desirable features of ADMOT.

1) ADMOT is optimal. For any k < n, ADMOT allows the receiver to reliably estimate all n channels within
$\Theta(k\log((n+1)/k))$ time slots, which matches the lower bound $\Theta(k\log((n+1)/k))$. To the best of our knowledge,
all previous monitoring schemes require $\Theta(n)$ time slots regardless of k. Thus, ADMOT significantly reduces the
overhead (compared with previous schemes) for small k, and preserves optimal performance even when k is close to n.
These arguments are also verified by simulation in Section VI.

2) Under ADMOT, the computational complexity of the receiver is dominated by convex optimization programming,
which can be computed in an efficient manner [14].

3) ADMOT is a feedback-free monitoring scheme, i.e., the receiver does not need to send feedback during the
monitoring and no centralized controller is assumed. Thus, the probing data can be incorporated as a packet header
for practical packet transmissions.

4) ADMOT supports different modulations [2] at the physical layer. For example, BPSK could be used. That is, the
sources can code the training data (of ADMOT) into binary symbols for BPSK modulation, such that both the channel
attenuation and the phase term can be estimated.

We note that the single-receiver scenario is the fundamental setting for studying the scaling law of monitoring
wireless networks. In Section V the above results are applied to achieve the scaling law of general communication
networks in which there are multiple transmitters, relay nodes and receivers.



2 The interval between two monitoring rounds depends on the statistics of the channel coherence time and the channel stability requirement.



C. Related Work 

Previous works fall into the following two categories. 

(a) Channel monitoring in wireless networks. The works [10], [11], [12], among others, designed probing data and
estimation algorithms for estimating channel gains, and other works, including [13], proposed schemes to estimate
channel interference. In the first set of works (which are related to our work), interference has not been shown to
be an advantage (compared with nonoverlapping probing signals from different transmitters), and the overhead achieved
is $\Theta(n)$. Note that in the domain of wireless network coding communication, an earlier work was the first to show
the advantage of interference, and later the work [5] proposed an amplify-and-forward relaying strategy for easy
implementation.

(b) Compressive sensing for channel estimation. ADMOT proposed in this paper uses recent advances in compressive
sensing developed for sparse signal recovery [16], [17]. Compressive sensing has been used to recover the sparse
features of channels, such as a channel's delay-Doppler sparsity [18], [19], a channel's sparse multipath structure [20], [19],
sparse-user detection [21], [22], [23] and a channel's sparse response [24]. When applying the above schemes to estimate
all the n channels from the transmitters, the overhead is at least $\Theta(n)$. In contrast, ADMOT uses compressive
sensing to handle all channels' differential information (embedded in the overlapped probing) simultaneously, and
achieves the optimal overhead $\Theta(k\log((n+1)/k))$.

Note that some previous schemes mentioned above estimate the property of a wideband channel, in which the 
channel gain varies across the frequency within the channel bandwidth. In contrast, this paper investigates the scaling 
law (on the network size) of wireless network monitoring. For the sake of exposition, we focus on narrowband 
channels in which the channel gain is flat across the bandwidth of the channel. We believe that within the same
scaling law complexity, ADMOT can easily be generalized to OFDM systems [2], in which information is carried
across multiple narrowband channels.

D. Organization of the paper 

The rest of this paper is organized as follows. Section II formulates the problem. The scaling law theorem
and the construction of ADMOT are presented in Section III. In Section IV ADMOT is implemented with BPSK
modulation. In Section V monitoring in general communication networks is studied. Experimental results are shown
in Section VI to support the theoretical analysis of ADMOT.



II. Problem Setting and Preliminaries 

A. Notation Conventions and Preliminaries 

Let $\mathbb{Z}$ be the set of all integers and $\mathbb{Z}^+$ be the set of all positive integers. Let $\mathbb{R}$ be the set of all real numbers (i.e.,
the real field). For any a and b in $\mathbb{Z}^+$, let $\mathbb{R}^{a \times b}$ be the set of all matrices with dimensions $a \times b$ and components
chosen from $\mathbb{R}$. For any matrix $M \in \mathbb{R}^{a \times b}$, $i \in \{1,2,\ldots,a\}$ and $j \in \{1,2,\ldots,b\}$, let M(i,j) be the (i,j)'th
component of M.

Let $\mathbb{R}^a$ be the set of all vectors with length a and components chosen from $\mathbb{R}$. For any vector $V \in \mathbb{R}^a$ and
$i \in \{1,2,\ldots,a\}$, let V(i) be the i'th component of V. Vectors in the paper are in column form.

For any vector $V \in \mathbb{R}^n$, let $\|V\|_1 = \sum_{i=1}^{n} |V(i)|$ denote the $\ell_1$-norm and $\|V\|_2 = \sqrt{\sum_{i=1}^{n} |V(i)|^2}$ denote
the $\ell_2$-norm. For any vector $V \in \mathbb{R}^n$ and non-negative integer k < n, we define the "distance" between V and
k-sparsity by:

$$d_k(V) = \|V - V_k\|_1, \tag{1}$$

where $V_k$ is V with all but the k largest (in magnitude) components set to 0. Vector V is said to be k-sparse if and only if
$d_k(V) = 0$, that is, there are at most k nonzero components in V.
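As a small illustration of Equation (1), the sketch below computes $d_k(V)$ for a real vector. It assumes that "largest" means largest in magnitude, and the function names are hypothetical helpers rather than notation from the paper.

```python
def distance_to_k_sparse(v, k):
    """d_k(V) = ||V - V_k||_1, where V_k keeps the k largest-magnitude entries."""
    magnitudes = sorted((abs(x) for x in v), reverse=True)
    # Everything after the k largest magnitudes is what V_k zeroes out.
    return sum(magnitudes[k:])


def is_k_sparse(v, k):
    """V is k-sparse if and only if d_k(V) = 0."""
    return distance_to_k_sparse(v, k) == 0
```

For V = (5, -0.1, 0.2, -3) and k = 2, the two dominant entries are 5 and -3, so $d_2(V)$ = 0.1 + 0.2 = 0.3: V is not 2-sparse, but it is close to 2-sparse.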

Let $\mathbb{C}$ be the set of all complex numbers. Matrices and vectors over $\mathbb{C}$ are defined similarly. For any
scalar, vector or matrix X over $\mathbb{C}$, let Re(X) be the real part of X and Im(X) be the imaginary part of X. For a
vector $H \in \mathbb{C}^n$, let $\|H\|_2^2 = \|\mathrm{Re}(H)\|_2^2 + \|\mathrm{Im}(H)\|_2^2$.

Let $\mathcal{N}(\mu, \sigma^2)$ denote the normal distribution over the real field, where $\mu$ is the mean and $\sigma^2$ is the variance. Note
that throughout the paper, the logarithm function log(.) is computed to base 2, i.e., $\log(\cdot) = \log_2(\cdot)$.

B. Communication Model for Wireless Network 

Consider a network where $\mathcal{S} = \{S_1, S_2, \ldots, S_n\}$ is the set of transmitting nodes. We first consider the scenario
where there is only one receiving node R.

We assume all transmissions are slotted and synchronized. In each time slot, every transmitter transmits
one symbol. Thus, one time slot is also one symbol duration, and slot synchronization is the same as symbol
synchronization. Assume each $S_i \in \mathcal{S}$ transmits symbol $X_i[s] \in \mathbb{C}$ in time slot s. Thus, for time slot s the
received signal at R is

$$Y[s] = \sum_{i=1}^{n} H(i) X_i[s] + Z[s], \tag{2}$$

where $H(i) \in \mathbb{C}$ is the channel gain of $(S_i,R)$ and $Z[s] \in \mathbb{C}$ is the noise. Note that both Re(Z[s]) and Im(Z[s])
are independent and identically distributed (i.i.d.) $\sim \mathcal{N}(0,1)$ across all time slots. The state of R is defined to be
a vector $H \in \mathbb{C}^n$, whose i'th component is H(i).
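To make the model of Equation (2) concrete, the following sketch simulates the samples received by R over several time slots. It is only an illustration: the function name and the `noise_std` parameter (which lets the noise be switched off for checking) are assumptions of this sketch, not part of the paper.

```python
import random


def received_signal(H, X, rng, noise_std=1.0):
    """Return Y[s] = sum_i H(i) * X_i[s] + Z[s] for every time slot s.

    H: list of n complex channel gains.
    X: list of time slots; X[s][i] is the symbol sent by transmitter i in slot s.
    Re(Z[s]) and Im(Z[s]) are drawn i.i.d. from N(0, noise_std^2).
    """
    Y = []
    for symbols in X:
        noise = complex(rng.gauss(0.0, noise_std), rng.gauss(0.0, noise_std))
        Y.append(sum(h * x for h, x in zip(H, symbols)) + noise)
    return Y
```

With `noise_std=1.0` the noise matches the normalized model of this section; setting it to 0 reproduces the noiseless superposition used in the toy example of Section I-A.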

The transmit power and noise power are both normalized to 1, and the amplitude of the channel gain
$|H(i)|$ is the square root of the signal-to-noise ratio (SNR) of $(S_i, R)$. For instance, for channel $(S_i, R)$, assume G(i)
is the "true" channel gain, P is the transmit power and $\sigma^2$ is the noise power. Then $\mathrm{SNR}(S_i, R) = P|G(i)|^2/\sigma^2$.
After normalizing the transmit power and noise power, the amplitude of the channel gain is
$|H(i)| = |G(i)|\sqrt{P}/\sigma = \sqrt{\mathrm{SNR}(S_i, R)}$.


As noted in Section I-B, this single-receiver scenario is the basic setting for understanding the scaling law of
wireless network monitoring. In Section V we consider communication networks with multiple transmitters, relay
nodes and receivers.



C. Variation in Wireless Network 

Wireless network conditions vary with time due to fading, transmit power instability, environmental changes
and movement of mobile wireless devices. For the purpose of communication, network variation is mathematically
equivalent to the variation of the state H (the definition of H can be found in Section II-B). Let $\hat{H} \in \mathbb{C}^n$ be the
a priori knowledge of the previous state held by the receiving node R, and H be the current state. The monitoring
objective of R is to estimate H using $\hat{H}$ and the received probes. Note that if R has no a priori knowledge of the
previous network state, $\hat{H}$ is set to be the zero vector in $\mathbb{C}^n$. For $\epsilon > 0$ and non-negative integer k < n, the
difference $H - \hat{H}$ is said to be $(k, \epsilon)$-sparse if and only if

$$d_k(\mathrm{Re}(H - \hat{H})) \le \epsilon \quad \text{and} \quad d_k(\mathrm{Im}(H - \hat{H})) \le \epsilon,$$

where the function $d_k$ is defined in Equation (1).

For simplicity, when $\epsilon$ is small, "$(k, \epsilon)$-sparse" is also said to be "approx-k-sparse". Thus, for an approx-k-sparse
variation $H - \hat{H}$, there are at most k channels suffering from significant variations of the channel gains,
while the variations of the other channels are negligible. In the following section ADMOT is proposed to catch up
with the k major varied channels with minimum overhead.

III. Achieving the Scaling law by ADMOT 

In this section, a novel wireless network monitoring protocol, ADMOT, is proposed to achieve the scaling law
shown in Theorem 1. To reduce the overhead of wireless monitoring, ADMOT fully develops the motivations shown
in Section I-A. Furthermore, ADMOT exploits recent advances in the field of compressive sensing ([16], [17]), such
that its correctness and optimality can be theoretically proved. A systematic view of ADMOT for consecutive
wireless network monitoring can be found in Figure 3.

A. Training Data of ADMOT 

The training data of ADMOT is denoted by a matrix $\Phi$ with dimensions $N \times n$. Here, n is the number of transmitters
in the network and N is the upper bound on the number of time slots used by ADMOT. Each component $\Phi(s, i)$ is generated
independently from {-1, 1} with equal probability, for all s and all i. The i'th column of $\Phi$ is assumed to
be known a priori to transmitter $S_i$, for all $i \in \{1,2,\ldots,n\}$. The knowledge of $\Phi$ can be broadcast by R in the
network setup stage (see footnote 3).

The training data of each $S_i \in \mathcal{S}$ (i.e., the i'th column of $\Phi$) is in fact the "algebraic fingerprint" of channel
$(S_i,R)$. As in the toy example shown in Figure 2, these fingerprints are "highly independent", such that the varied
channels would expose their fingerprints even under interference. In the next subsection, using a convex program,
ADMOT catches up with the exposed fingerprints in an efficient and reliable manner.

B. Complete Construction of ADMOT 

Before the detailed construction of ADMOT, we introduce a convex-optimization problem that serves as a
submodule for ADMOT.

• ConvexOPT(A, Y, a). The input of ConvexOPT(A, Y, a) is $A \in \mathbb{R}^{m \times n}$, $Y \in \mathbb{R}^m$ and a > 0. The output
of ConvexOPT(A, Y, a) is the solution $X^* \in \mathbb{R}^n$ to the following problem:

$$\min \|X\|_1 \quad \text{subject to} \quad \|AX - Y\|_2 \le a. \tag{3}$$

Note that ConvexOPT(A, Y, a) is a second-order cone program and can be solved efficiently [14].

Let m < N be the system parameter denoting the number of time slots used by the current round of ADMOT.
We construct:

• ADMOT($\hat{H}$, S, R, m).

• Variable initialization: Vector $H^* \in \mathbb{C}^n$ is the estimate of H, which is initialized to be the zero vector. Vector
$Y \in \mathbb{C}^m$ is initialized to be the zero vector. Let $\Phi_m$ be the matrix consisting of rows 1, 2, ..., m of $\Phi$.

• Step A: For s = 1, 2, ..., m, in the s'th time slot (see footnote 4):

- Each $S_i \in \mathcal{S}$ sends $\Phi(s,i)$.

- Node R sets Y(s) (i.e., the s'th component of Y) to be the received sample in the time slot. Thus,
$Y(s) = \sum_{i=1}^{n} \Phi(s,i)H(i) + Z(s)$, where Z(s) is the noise in the time slot (see Section II-B for details).

• Step B: Node R computes $D \in \mathbb{C}^m$ as $D = Y - \Phi_m \hat{H}$. Thus, $D = \Phi_m (H - \hat{H}) + Z$.

• Step C: Node R runs ConvexOPT($\Phi_m$, Re(D), $\sqrt{2m}$) and ConvexOPT($\Phi_m$, Im(D), $\sqrt{2m}$). Let the solutions
be denoted by $\mathrm{Re}(\Delta^*) \in \mathbb{R}^n$ and $\mathrm{Im}(\Delta^*) \in \mathbb{R}^n$, respectively.

• Step D: Node R estimates H by $H^* = \hat{H} + \Delta^*$.

• Step E: End ADMOT($\hat{H}$, S, R, m).

3 To avoid the overhead of broadcasting $\Phi$, we can generate $\Phi$ with practical pseudorandom generators (such as AES [25]). To be concrete,
the i'th column of $\Phi$ could be the output of AES(i). Thus, each node in the network can compute $\Phi$ using AES. Note that since ADMOT
can be simulated within polynomial time, pseudorandomness suffices [25].

4 Note that the probing scheme of ADMOT looks like CDMA [2]. For clarification, we note the differences between ADMOT and
CDMA: 1) CDMA is for information data detection, while ADMOT is for channel estimation. 2) CDMA requires a near-orthogonal code
sequence for each transmitter. In ADMOT, since m could be much less than n, the training data sent by each $S_i \in \mathcal{S}$ can be far from
orthogonal. However, the combination of ADMOT and CDMA is an interesting direction for future research.

Thus, ADMOT can be performed in a feedback-free manner, i.e., the receiver does not need to send feedback
during the monitoring and no centralized controller is assumed. Under ADMOT, the computational complexity of
the receiver node R is dominated by running the second-order cone program ConvexOPT, which can be solved in
an efficient manner [14].
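The receiver-side steps of ADMOT can be sketched in a few lines of NumPy. The sketch below is not the paper's solver: in place of the second-order cone program ConvexOPT it substitutes a greedy orthogonal-matching-pursuit routine that stops once the same residual constraint, with the same noise budget $\sqrt{2m}$, is met. The function names `greedy_sparse_fit` and `admot_round` are illustrative; a faithful implementation would call an SOCP solver in Step C.

```python
import numpy as np


def greedy_sparse_fit(A, y, a):
    """Greedy stand-in for ConvexOPT(A, y, a): add one column at a time
    (orthogonal matching pursuit) until ||A x - y||_2 <= a."""
    m, n = A.shape
    x = np.zeros(n)
    support = []
    residual = np.array(y, dtype=float)
    while np.linalg.norm(residual) > a and len(support) < min(m, n):
        scores = np.abs(A.T @ residual)
        if support:
            scores[support] = -1.0            # never select a column twice
        support.append(int(np.argmax(scores)))
        # Least-squares refit on the columns selected so far.
        coef, *_ = np.linalg.lstsq(A[:, support], y, rcond=None)
        x = np.zeros(n)
        x[support] = coef
        residual = y - A @ x
    return x


def admot_round(H_prior, Phi_m, Y):
    """Steps B to D of ADMOT for one round, given the received samples Y."""
    m = Phi_m.shape[0]
    D = Y - Phi_m @ H_prior                   # Step B: differential samples
    a = np.sqrt(2 * m)                        # noise budget used in Step C
    delta = (greedy_sparse_fit(Phi_m, D.real, a)
             + 1j * greedy_sparse_fit(Phi_m, D.imag, a))   # Step C (stand-in)
    return H_prior + delta                    # Step D
```

The real and imaginary parts are handled separately, exactly as in Step C, and only the difference from the prior state is reconstructed, so the number of samples needed depends on how many channels changed rather than on n.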

C. Main theorem of the paper 

The main theorem of the paper is:

Theorem 1: ADMOT is optimal.

• Scaling law. For any k < n, when $H - \hat{H}$ is $(k, \delta\sqrt{k})$-sparse, any monitoring scheme achieving estimation
error $\|H^* - H\|_2 \le O(\delta)$ requires at least $\Theta(k\log((n+1)/k))$ time slots.

• Achievability. Let k > 0 be the maximum integer satisfying $C_0 k\log((n+1)/k) < m$ for a constant $C_0$, and
$\delta > 0$ be the minimum real number such that $H - \hat{H}$ is $(k, \delta\sqrt{k})$-sparse. The estimation error of ADMOT
satisfies $\|H^* - H\|_2 \le \sqrt{2}C_1\delta + 2C_2$ with probability $1 - O(e^{-0.15m})$. Here, $C_0$, $C_1$ and $C_2$ are constants
defined in Appendix A.

The detailed proof of the theorem is in Appendix B. We have the following remarks regarding the theorem.

Remark 1: When $H - \hat{H}$ is approx-k-sparse (i.e., $\delta$ is small), the theorem shows that ADMOT reliably estimates
H within $\Theta(k\log((n+1)/k))$ time slots, which achieves the scaling law.



Remark 2: Recall that we normalize both the probe power and the noise power for clarity. As shown in Section II-B,
the "true" channel gain G(i) is in fact $G(i) = H(i)\sigma/\sqrt{P}$, where P is the transmit power and $\sigma^2$ is the noise power.
Thus, the "true" channel gains are estimated by $G^* = \sigma H^*/\sqrt{P}$. When $G - \hat{G}$ is $(k, \delta_G\sqrt{k})$-sparse, $H - \hat{H}$ is
$(k, \delta\sqrt{k})$-sparse with $\delta = \delta_G\sqrt{P}/\sigma$. Thus, the estimation error of G satisfies
$\|G^* - G\|_2 \le (\sqrt{2}C_1\delta + 2C_2)\sigma/\sqrt{P} = \sqrt{2}C_1\delta_G + 2C_2\sigma/\sqrt{P}$. Thus, for large probing power P, the error term
$2C_2\sigma/\sqrt{P}$ caused by noise vanishes and the bound on $\|G^* - G\|_2$ approaches $\sqrt{2}C_1\delta_G$ (see footnote 5).
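The normalization used in Remark 2 amounts to a rescaling by $\sqrt{P}/\sigma$. The helpers below are a minimal sketch of that conversion (the function names are hypothetical), assuming a common transmit power P and noise standard deviation $\sigma$ for all channels.

```python
import math


def normalize_gain(G, P, sigma):
    """Map a "true" gain G(i) to the normalized gain H(i) = G(i) * sqrt(P) / sigma."""
    return G * math.sqrt(P) / sigma


def denormalize_gain(H, P, sigma):
    """Inverse map: G(i) = H(i) * sigma / sqrt(P)."""
    return H * sigma / math.sqrt(P)


def snr(G, P, sigma):
    """SNR(S_i, R) = P * |G(i)|^2 / sigma^2, which equals |H(i)|^2."""
    return P * abs(G) ** 2 / sigma ** 2
```

Because the map is linear, an error bound on the normalized estimate $H^*$ translates into a bound on $G^*$ by multiplying with $\sigma/\sqrt{P}$, which is the step used in the remark.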

D. Adjusting system parameter m 

The system parameter m corresponds to the trade-off between overhead and estimation error. Ideally, we should
choose $m = C_0 k\log((n+1)/k)$, where k is the number of channel gains that have varied significantly since
the last round of estimation. Thus, the receiver R should estimate the typical number of varied channels between two
monitoring rounds, and then adjust m for future rounds of ADMOT (see footnote 6).

This objective can be achieved by analyzing the estimation error in the past rounds. To be concrete, consider a past
round of ADMOT with system parameter m. Let Y be the received data in this round. Receiver R divides $Y \in \mathbb{C}^m$
into two parts (one for "estimation" and the other for "testing"): vector $Y_1 \in \mathbb{C}^{m-d}$ comprises the first m - d
components of Y and $Y_2 \in \mathbb{C}^d$ comprises the last d components of Y. Similarly, matrix $\Phi_{(m,1)}$ comprises
the first m - d rows of $\Phi_m$ and $\Phi_{(m,2)}$ comprises the last d rows of $\Phi_m$.

Receiver R runs Steps B, C and D of ADMOT using $Y_1$ and $\Phi_{(m,1)}$ (instead of Y and $\Phi_m$, respectively). Let
$H_1^*$ be the resulting estimate of H. Let $D_2 = Y_2 - \Phi_{(m,2)}H_1^*$ and $\|H - H_1^*\|_2 = \varphi$. Then we have:

5 Note that for the case where each node in $\mathcal{S}$ has a different probing power, a similar argument can be shown with somewhat unwieldy
notation.

6 Note that the receiver R can inform the other nodes of its choice of m by broadcasting before the next round of ADMOT. For instance,
consider a cellular network where the receiver is the base station [1]. The information of m can be delivered in the stage of downlink
transmission.

Theorem 2: The event $\|D_2\|_2^2 > d(\varphi\sqrt{3/2} + 2)^2$ happens with probability at most $O(e^{-0.15d})$. If $\varphi > 2\sqrt{2}$,
the event $\|D_2\|_2^2 < d(\varphi/\sqrt{2} - 2)^2$ happens with probability at most $O(e^{-0.15d})$.

The proof is deferred to Appendix C.

The theorem shows that the estimation error $\|H - H_1^*\|_2$ is closely related to $\|D_2\|_2$. Thus, when
$\|D_2\|_2$ is large, R concludes that m - d time slots do not suffice to estimate H. Note that since d is relatively
small (compared with m), probing for m time slots is also not reliable for estimating H. Thus, R should increase m
for future rounds of ADMOT.

On the other hand, a small $\|D_2\|_2$ implies that m - d time slots are sufficient for estimating H. To estimate more
precisely the minimum number of time slots that suffice, R can update d to 2d and then recompute $\|D_2\|_2$.
R repeats this process until it finds the minimum integer p such that m - pd time slots are not sufficient.
In the end, based on pd, R can choose an appropriate decrease of m for future rounds of ADMOT.
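The hold-out test described in this subsection can be sketched as follows. The code only computes the residual statistic on the "testing" part and compares it with a user-supplied threshold; the threshold is an assumption of this sketch and is not derived from the constants of Theorem 2. The function names are illustrative.

```python
def holdout_statistic(Y2, Phi2, H_est):
    """Return ||D_2||_2^2 / d, where D_2 = Y_2 - Phi_(m,2) H_est.

    Y2:    the last d received samples (complex numbers).
    Phi2:  the last d rows of Phi_m (lists of +1 / -1 entries).
    H_est: estimate of H obtained from the first m - d samples only.
    """
    d = len(Y2)
    total = 0.0
    for y, row in zip(Y2, Phi2):
        residual = y - sum(p * h for p, h in zip(row, H_est))
        total += abs(residual) ** 2
    return total / d


def should_increase_m(Y2, Phi2, H_est, threshold):
    """Flag the round as under-sampled when the hold-out residual is large.

    `threshold` is a tuning knob chosen by the user. With a perfect estimate
    the statistic concentrates near 2, because Re(Z) and Im(Z) each have
    unit variance.
    """
    return holdout_statistic(Y2, Phi2, H_est) > threshold
```

A receiver could evaluate this statistic for d, 2d, 3d, ... held-out samples, as described above, and shrink m only while the statistic stays small.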

IV. BPSK Implementation of ADMOT 

In this section, we describe the implementation of ADMOT using BPSK modulation [2]. That is, we assume the symbols
in $\Phi$ are BPSK symbols.

Each node $S_i \in \mathcal{S}$ transmits a BPSK symbol $\Phi(s, i) \in \{-1, 1\}$ in time slot s. All transmitters transmit their
symbols on the angular carrier frequency $\omega$. Let T be the duration of a time slot. Thus, in continuous time, $S_i$
transmits the signal $x_i(t) = \mathrm{Re}\left(\sum_s \Phi(s,i)\,p(t - sT)\,e^{j\omega t}\right)$, where p(t) = 1 for 0 < t < T, and p(t) = 0 otherwise.
Let the channel gain associated with $S_i$ be $H(i) = A_i e^{-j\theta_i}$, where $A_i \in [0, +\infty)$ is the amplitude of the channel
gain and $\theta_i$ is the phase delay due to signal propagation delay.

The signal reaching R from $S_i$ is then $\mathrm{Re}\left(\sum_s \Phi(s,i)\,p(t - sT)\,e^{j\omega t}A_i e^{-j\theta_i}\right)$. Taking into consideration the
signals from all nodes in $\mathcal{S}$ and the circuit noise, the combined signal at R is

$$y(t) = \sum_i \mathrm{Re}\left(\sum_s \Phi(s,i)\,p(t - sT)\,e^{j\omega t}A_i e^{-j\theta_i}\right) + z(t),$$

where z(t) is the noise.
We assume that T is much larger than the carrier period $2\pi/\omega$. By matched filtering (i.e., multiplying y(t) by $\cos(\omega t)$ and
integrating over successive symbol periods T; and multiplying y(t) by $\sin(\omega t)$ and integrating over successive symbol periods), we can get

$$Y_{\cos}[s] = \sum_i A_i\cos(\theta_i)\Phi(s,i) + \frac{2}{T}\int_{(s-1)T}^{sT} z(t)\cos(\omega t)\,dt,$$

$$Y_{\sin}[s] = \sum_i A_i\sin(\theta_i)\Phi(s,i) + \frac{2}{T}\int_{(s-1)T}^{sT} z(t)\sin(\omega t)\,dt.$$

Note that the noise power is normalized to 1 such that $\frac{2}{T}\int_{(s-1)T}^{sT} z(t)\cos(\omega t)\,dt$ and $\frac{2}{T}\int_{(s-1)T}^{sT} z(t)\sin(\omega t)\,dt$
are both i.i.d. $\sim \mathcal{N}(0, 1)$.

Thus, there are two sets of channels and no inter-set interference happens. The state of the first set of channels is
$H_{\cos} \in \mathbb{R}^n$, where the i'th component is $H_{\cos}(i) = A_i\cos(\theta_i)$. The state of the second set of channels is
$H_{\sin} \in \mathbb{R}^n$, where the i'th component is $H_{\sin}(i) = A_i\sin(\theta_i)$.

Following ADMOT, the receiver can estimate $H_{\sin}$ and $H_{\cos}$ simultaneously (in the same way as monitoring Re(H) and Im(H),
see Section III-B). Once $H_{\sin}$ and $H_{\cos}$ are estimated, $\{(A_i, \theta_i) : S_i \in \mathcal{S}\}$ can be computed efficiently. To be
concrete, let $H^*_{\sin}$ and $H^*_{\cos}$ be the estimates of $H_{\sin}$ and $H_{\cos}$, respectively. Then $\theta_i$ can be estimated by
$\theta_i^* = \tan^{-1}(H^*_{\sin}(i)/H^*_{\cos}(i))$ and $A_i$ can be estimated by $A_i^* = \sqrt{(H^*_{\cos}(i))^2 + (H^*_{\sin}(i))^2}$.
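The last step, recovering the amplitude and phase from the two estimated components, is a polar conversion. The sketch below uses the quadrant-aware `math.atan2` rather than a plain arctangent of the ratio; this is a small deviation from the formula in the text, chosen so that phases outside $(-\pi/2, \pi/2)$ and the case $H^*_{\cos}(i) = 0$ are handled. The function name is hypothetical.

```python
import math


def amplitude_and_phase(h_cos, h_sin):
    """Recover (A_i, theta_i) from H_cos(i) = A_i cos(theta_i) and
    H_sin(i) = A_i sin(theta_i)."""
    amplitude = math.hypot(h_cos, h_sin)   # sqrt(h_cos^2 + h_sin^2)
    phase = math.atan2(h_sin, h_cos)       # quadrant-aware arctangent
    return amplitude, phase
```

The returned phase lies in $(-\pi, \pi]$; for example, a channel with $A_i = 2$ and $\theta_i = 2.5$ is recovered exactly, whereas a plain arctangent of the ratio would return $2.5 - \pi$.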

Note that ADMOT can also be implemented with other modulations. Due to space limitations, we only present
the BPSK implementation of ADMOT in this paper.

V. Achieving the Scaling Law for General Communication Network 

In this section, we study the general communication network with multiple transmitters, receivers and intermediate
relay nodes, e.g., the cooperative communication networks [26]. To be concrete, we can model the general
communication network as $(\mathcal{S}, \mathcal{R}, \mathcal{C})$, where $\mathcal{S} = \{S_1, S_2, \ldots, S_n\}$ is the set of transmitters, $\mathcal{R} = \{R_1, R_2, \ldots, R_{n'}\}$ is
the set of receivers and $\mathcal{C} = \{C_1, C_2, \ldots, C_{n''}\}$ is the set of intermediate nodes for relaying. Let $\mathcal{E}$ denote the set of
all the channels: $\{(S_i, R_j), (C_a, R_j), (S_i, C_a), (C_a, C_b) : S_i \in \mathcal{S}, R_j \in \mathcal{R}, C_a, C_b \in \mathcal{C}\}$. Each channel in $\mathcal{E}$ is either
used for communication or considered as an interfering channel, and therefore needs to be monitored.




Fig. 4. The channels in an illustrative communication network $(\mathcal{S}, \mathcal{R}, \mathcal{C})$. In the figure, $\mathcal{S} = \{S_1\}$, $\mathcal{R} = \{R_1\}$ and $\mathcal{C} = \{C_1, C_2\}$. The
directed lines denote the channels which are used for communications and therefore require monitoring.
For each node $\beta \in \mathcal{R} \cup \mathcal{C}$, let $H_\beta \in \mathbb{C}^{n+n''}$ be the channel gains of the channels from $\mathcal{S} \cup \mathcal{C}$ to $\beta$, and $\hat{H}_\beta \in \mathbb{C}^{n+n''}$
be the a priori knowledge of these channel gains held by $\beta$. Assume that for each node in $\mathcal{R} \cup \mathcal{C}$, its state variation
$H_\beta - \hat{H}_\beta$ is approx-k-sparse for k < n + n''. By the scaling law in Theorem 1, at least $\Theta(k\log((n+n''+1)/k))$
time slots are needed.

Under the full-duplex model, where any node in $\mathcal{C}$ can transmit and receive in the same time slot,
ADMOT($\hat{H}_\beta$, $\mathcal{S} \cup \mathcal{C}$, $\beta$, m) can be performed simultaneously for each $\beta \in \mathcal{R} \cup \mathcal{C}$. Thus, we can achieve the above
overhead lower bound by choosing $m = \Theta(k\log((n+n''+1)/k))$.

For the half-duplex model, no node in $\mathcal{C}$ can transmit and receive in the same time slot. Note that in the
half-duplex model, for any node $C_j \in \mathcal{C}$ the channel gain of $(C_j, C_j)$ is always assumed to be 0. In the next subsection
we propose a non-straightforward generalization of ADMOT (named ADMOT-GENERAL) to achieve the scaling
law $\Theta(k\log((n+n''+1)/k))$.

Note that for both models, our achievable schemes can be performed in a distributed manner, i.e., no centralized
controller is needed.

A. ADMOT-GENERAL 

To handle the half-duplex feature, in each time slot ADMOT-GENERAL randomly selects nodes in $\mathcal{C}$ (each with probability 1/2)
to send probe data, while the other nodes in $\mathcal{C}$ receive the signal in that time slot. To be concrete, let
$\Phi \in \mathbb{R}^{N \times (n+n'')}$ be the training data matrix. Each component of $\Phi$ is i.i.d. chosen from {0, -1, 1} with probabilities
{1/2, 1/4, 1/4}. For each $S_i \in \mathcal{S}$, the i'th column of $\Phi$ is the training data of $S_i$. For each $C_j \in \mathcal{C}$, the (n + j)'th
column of $\Phi$ is the training data of $C_j$. In ADMOT-GENERAL, for the s'th time slot, if the training data of $C_j$ is
zero, $C_j$ receives the signal in the time slot; otherwise $C_j$ sends the corresponding probe data.

Choosing $m = 3C'k\log(n/k) = \Theta(k\log(n/k))$, where C' is a constant defined in Appendix A, we have:

. ADMOT-GENERAL(mQ 

• variables Initialization: For each (3 G 1Z U C, vector H% G C n+n " is the estimation of H, which is initialized 
to be zero vector. For each (3 G TZ U C, let Zp = {i : <3?(z, n + j) = 0,i G {1, 2, m}} if (3 = Cj G C, and 
1/3 = {1,2, ...,m} if p ell. For each j3 G TZ U C, let mp = \1 B \, and $ B G R m f><( n + n ") consist of the rows 
of $ which are indexed by Xp, vector Yp G C ms be initialized to be zero vector. 

• Step A: For s = 1, 2, m, in the s'th time slot: 

- For any Si G S, Si sends 3>(s,i). 

- For any Cj G C, if $(s, n+j) = 0, Cj receives signal in this time slot; Otherwise Cj sends &p(sp,n+j), 
where sp G Xp is the index of the row in Qp which corresponds to the s'th row of <I>. 

7 For the simplicity, we omit other parameters {{Hp : /3 £ S U C}, S, C, TZ). 
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- Any node in 1Z receives signal in this time slot. 

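- For each node $\beta \in \mathcal{R} \cup \mathcal{C}$, if $\beta$ received signal in this time slot, $\beta$ sets $Y_\beta(s_\beta)$ to be the received sample. Thus, $Y_\beta(s_\beta) = \sum_{i=1}^{n+n''} \Phi_\beta(s_\beta, i) H_\beta(i) + Z_\beta(s_\beta)$, where $Z_\beta(s_\beta)$ is the noise in the time slot.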

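• Step B: For each $\beta \in \mathcal{R} \cup \mathcal{C}$, $\beta$ computes $D_\beta \in \mathbb{C}^{m_\beta}$ as $D_\beta = Y_\beta - \Phi_\beta \hat{H}_\beta$. Thus, $D_\beta = \Phi_\beta(H_\beta - \hat{H}_\beta) + Z_\beta$, where $Z_\beta \in \mathbb{C}^{m_\beta}$ is the noise vector for $\beta$.

• Step C: For each $\beta \in \mathcal{R} \cup \mathcal{C}$, $\beta$ runs ConvexOPT($\Phi_\beta$, $\mathrm{Re}(D_\beta)$, $\sqrt{2m_\beta}$) and ConvexOPT($\Phi_\beta$, $\mathrm{Im}(D_\beta)$, $\sqrt{2m_\beta}$).^8 Let the solutions be denoted by $\mathrm{Re}(\Delta^*_\beta) \in \mathbb{R}^{n+n''}$ and $\mathrm{Im}(\Delta^*_\beta) \in \mathbb{R}^{n+n''}$, respectively.

• Step D: For each $\beta \in \mathcal{R} \cup \mathcal{C}$, $\beta$ estimates $H_\beta$ by $H^*_\beta = \hat{H}_\beta + \Delta^*_\beta$.

• Step E: End ADMOT-GENERAL.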
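To make Steps A and B concrete for a single relay, the following sketch collects the samples a relay sees in its listening slots and forms the residual $D_\beta$. It is a noise-free toy under stated assumptions: the helper names, the tiny dimensions, and the single varied channel are hypothetical, and Step C (ConvexOPT, defined in Section III-B) is deliberately left out.

```python
import random

def listening_rows(phi, col):
    """Time slots (row indices) in which the node owning column `col` is silent."""
    return [s for s, row in enumerate(phi) if row[col] == 0]

def measure(phi_rows, h):
    """Noise-free received samples: one inner product per listening slot."""
    return [sum(p * g for p, g in zip(row, h)) for row in phi_rows]

def residual(phi_rows, y, h_prior):
    """Step B: D = Y - Phi_beta * H_hat."""
    predicted = measure(phi_rows, h_prior)
    return [y_s - p_s for y_s, p_s in zip(y, predicted)]

rng = random.Random(1)
n, n_relay, m = 4, 3, 12
phi = [[rng.choice((0, 0, -1, 1)) for _ in range(n + n_relay)] for _ in range(m)]
col = n + 0                       # column of the first relay (0-based j = 0)
h_prior = [complex(rng.uniform(-1, 1), rng.uniform(-1, 1))
           for _ in range(n + n_relay)]
h_prior[col] = 0j                 # self-channel is assumed to be zero
change = 0.5 - 0.25j
h_true = list(h_prior)
h_true[2] += change               # exactly one channel has varied (assumption)

rows = listening_rows(phi, col)   # plays the role of the index set I_beta
phi_beta = [phi[s] for s in rows]
d = residual(phi_beta, measure(phi_beta, h_true), h_prior)
```

In this toy run every entry of `d` equals the training symbol of the varied channel times `change`, which is the noise-free form of $D_\beta = \Phi_\beta(H_\beta - \hat{H}_\beta)$.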

For each $\beta \in \mathcal{R} \cup \mathcal{C}$, using the Chernoff bound [27], we have $m_\beta \ge m/3 = C'_0 k\log(n/k)$ with probability at least $1 - 2^{-\Omega(m)}$. Using the achievability result in Theorem 1, we conclude that $\beta$ can recover $H_\beta$ with bounded square root error.^9
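For a relay, $m_\beta$ counts the zero entries in its own column over $m$ slots, so it is Binomial($m$, 1/2). As a sanity check on how quickly the event $m_\beta < m/3$ becomes unlikely, the exact tail can be evaluated directly. The function name below is hypothetical, and the check illustrates the exponential decay rather than re-deriving any constant.

```python
from math import ceil, comb

def prob_too_few_listening_slots(m):
    """Exact P(m_beta < m/3) when m_beta ~ Binomial(m, 1/2)."""
    threshold = ceil(m / 3)  # smallest integer that is >= m/3
    return sum(comb(m, t) for t in range(threshold)) / 2 ** m

# Tail probabilities for a few values of m (each doubling of m shrinks the tail).
tails = {m: prob_too_few_listening_slots(m) for m in (30, 60, 120)}
```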

VI. Performance evaluation 

We evaluate ADMOT by implementing it in a systematic manner, as shown in Figure 3. Let $n = |\mathcal{S}| = 500$, and let the average channel SNR be 20 dB.

Recall that a channel is said to preserve stability $x\%$ if the probability that the channel suffers significant variation during the interval between two monitoring rounds is no more than $1 - x\%$.

In the simulation, let $H[r]$ be the state of the $r$'th round. Thus, for channel stability $x\%$, $H[r] = H[r-1] + \Delta[r]$, where $\Delta[r]$ is the variation. Each component of $\Delta[r] \in \mathbb{C}^n$, say $\Delta[r](i)$, is independently generated as follows: with probability $x\%$, both $\mathrm{Re}(\Delta[r](i))$ and $\mathrm{Im}(\Delta[r](i))$ are uniformly chosen from $[-10, 10]$; with probability $1 - x\%$, both $\mathrm{Re}(\Delta[r](i))$ and $\mathrm{Im}(\Delta[r](i))$ are uniformly chosen from $[-250, 250]$.
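The variation model above is simple to reproduce. The sketch below draws one round of $\Delta[r]$; the function name `draw_variation`, the seed, and the representation of stability as a fraction in [0, 1] are assumptions made for illustration.

```python
import random

def draw_variation(n, stability, rng):
    """One round of channel variation: each component is 'small' (uniform on
    [-10, 10] per real/imaginary part) with probability `stability`, and
    'large' (uniform on [-250, 250]) otherwise."""
    delta = []
    for _ in range(n):
        bound = 10.0 if rng.random() < stability else 250.0
        delta.append(complex(rng.uniform(-bound, bound),
                             rng.uniform(-bound, bound)))
    return delta

rng = random.Random(0)
delta_demo = draw_variation(500, 0.90, rng)
# Components that are certainly 'large' draws (some large draws may look small).
clearly_large = sum(1 for x in delta_demo
                    if abs(x.real) > 10 or abs(x.imag) > 10)
```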

We run ADMOT($H^*[r-1]$, $\mathcal{S}$, $R$, $m_r$) for the $r$'th round of estimation. Here, $H^*[r-1]$ is the estimate of $H[r-1]$ in the $(r-1)$'th round, and the system parameter $m_r$ is chosen depending on the data received in the previous rounds (see Section III-D for details).



Figure 5 shows the average number of time slots (per round) used by ADMOT for $x \in (0, 100)$. In the figure, the solid line is for ADMOT, and the dashed line is for previous monitoring schemes (see related works in Section I-C). From the figure, we can see that ADMOT significantly reduces the overhead in the scenarios where $x$ is large, i.e., where the channel stability is high. In the region where $x$ is small, ADMOT also preserves reliable performance.

8 Note that if $\beta = C_j \in \mathcal{C}$, since the channel gain of $(C_j, C_j)$ is always assumed to be zero, the $(n+j)$'th components of $\mathrm{Re}(D_\beta)$ and $\mathrm{Im}(D_\beta)$ are both fixed to be zero for running ConvexOPT.

9 Note that $\Phi_\beta$ satisfies RIP of order $k$ with probability at least $1 - 2^{-m}$ (see Appendix A), which is the sufficient requirement for applying Theorem 1 (see the proof in Appendix B).

[Figure 5. Plot titled "Performance of ADMOT"; horizontal axis: Channel Stability x%, from 0% to 99%.]

Fig. 5. Comparison between ADMOT and previous monitoring schemes.

We also provide detailed simulations for the cases where the channels preserve stabilities of 80%, 90%, and 98%, respectively.

Figure 6 shows the number of time slots used by ADMOT for round $r \in \{1, 2, \ldots, 50\}$. For channel stabilities of 80%, 90%, and 98%, the average numbers of time slots used per round are 320, 252, and 140, respectively. Note that since we assume zero knowledge of the initial network state, the first round of each case costs almost 500 time slots.
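As a rough back-of-the-envelope check, the reported averages can be set next to the scaling term $k\log((n+1)/k)$. The sketch below assumes that the number of significantly varied channels per round is about $k = (1 - x\%)\,n$ and that the logarithm is natural; both are assumptions of this illustration, and the constant hidden in the scaling law is not known from the text, so only the trend is meaningful.

```python
from math import log

def scaling_term(n, k):
    """k * ln((n + 1) / k), the quantity inside the Theta(.) of the scaling law."""
    return k * log((n + 1) / k)

n = 500
reported_slots = {0.80: 320, 0.90: 252, 0.98: 140}  # averages quoted in the text
summary = {}
for stability, slots in reported_slots.items():
    k = round((1 - stability) * n)        # assumed number of varied channels
    saving = 1 - slots / n                # fraction saved versus n = 500 slots
    summary[stability] = (k, scaling_term(n, k), saving)
```

Under these assumptions both the reported slot counts and the scaling term decrease as the stability grows, which is the qualitative behaviour the scaling law predicts.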



Fig. 6. Estimating time for each stage of running ADMOT.

Figure 7 shows the relative estimation errors (i.e., $\|H^*[r] - H[r]\|_2/\|H[r]\|_2$) of ADMOT for round $r \in \{1, 2, \ldots, 50\}$.

Note that we bound the estimation error regardless of the channel stability $x\%$. Thus, lower channel stability only corresponds to more overhead (as shown in Figure 6).



[Figure 7. Horizontal axis: Round, from 5 to 50.]

Fig. 7. Relative estimating error for each stage of running ADMOT.

For a closer look, we also show the estimates at round 50 for the case of 80% channel stability. Figure 8 draws (the absolute value of the real part of) the channel gains and the corresponding estimates for the 200'th, 201'st, ..., 300'th channels.

VII. Conclusion 

In this paper, we first investigate the scenario where a receiver needs to track the channel gains of the channels with respect to $n$ transmitters. We assume that in the current round of channel gain estimation, no more than $k < n$ channels have suffered significant variations since the last round. We prove that "$\Theta(k\log((n + 1)/k))$ time slots" is the minimum number of time slots needed to catch up with the $k$ varying channels. At the same time, we propose ADMOT to achieve the lower bound in a computationally efficient manner. Furthermore, ADMOT supports different modulations at the physical layer. Using the above results, we also achieve the scaling law of general communication networks in which there are multiple transmitters, relay nodes and receivers. At the end of the paper, we also present simulation results to support our theoretical analysis.

Appendix A
Preliminaries of Compressive Sensing

Compressive sensing is a mathematical technique developed for recovering compressible data from significantly fewer samples than the length of the data [16], [17]. All compressive sensing results used in the paper are introduced in this section.

Let $M$ be a matrix in $\mathbb{R}^{m\times n}$ with $m \ll n$. Assume each column of $M$ is normalized to have $\ell_2$-norm 1. For a positive integer $k$, $M$ is said to satisfy the restricted isometry property (RIP) of order $k$ if, for some $\delta_k \in (0, 1)$, $(1 - \delta_k)\|X\|_2^2 \le \|MX\|_2^2 \le (1 + \delta_k)\|X\|_2^2$ for all $k$-sparse vectors $X \in \mathbb{R}^n$ [28].

Let $A \in \mathbb{R}^{m\times n}$ be a matrix. Then we have [29]:

• If each element of $A$ is i.i.d. generated from $\{-1, 1\}$ with probabilities $\{1/2, 1/2\}$, then, with overwhelming probability (i.e., $1 - O(2^{-n})$), the matrix $A/\sqrt{m}$ satisfies RIP of order $k$ provided that $m \ge C_0 k\log((n + 1)/k)$, where $C_0$ is a constant depending on each instance.

• If each element of $A$ is i.i.d. generated from $\{0, -1, 1\}$ with probabilities $\{1/2, 1/4, 1/4\}$, then, with overwhelming probability (i.e., $1 - O(2^{-n})$), the matrix $\sqrt{2/m}\,A$ satisfies RIP of order $k$ provided that $m \ge C'_0 k\log((n + 1)/k)$, where $C'_0$ is a constant depending on each instance.
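The role of the $\sqrt{2/m}$ scaling in the second case can be seen numerically: each entry of $A$ has second moment 1/2, so the scaled matrix preserves the energy of a fixed vector on average. The sketch below measures this for one random sparse vector. It is only a spot check under assumed parameters (the function name and sizes are hypothetical); RIP itself is a uniform statement over all $k$-sparse vectors, which this code does not verify.

```python
import random
from math import sqrt

def energy_ratio(m, n, k, rng):
    """Return ||sqrt(2/m) * A * X||_2^2 / ||X||_2^2 for one random k-sparse X,
    where A is m x n with i.i.d. entries 0 (prob. 1/2), -1, +1 (prob. 1/4 each)."""
    support = rng.sample(range(n), k)
    x = {i: rng.uniform(-1.0, 1.0) for i in support}
    norm_x_sq = sum(v * v for v in x.values())
    total = 0.0
    for _ in range(m):
        row = [rng.choice((0, 0, -1, 1)) for _ in range(n)]
        y = sqrt(2.0 / m) * sum(row[i] * v for i, v in x.items())
        total += y * y
    return total / norm_x_sq

rng = random.Random(7)
ratios = [energy_ratio(200, 600, 5, rng) for _ in range(3)]
```

Ratios close to 1 are what the restricted isometry property requires, uniformly and with an explicit $\delta_k$, for every $k$-sparse vector.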
Let $X \in \mathbb{R}^n$ be the data vector and $Y = AX + Z$ be the noisy measurement, where $Z \in \mathbb{R}^m$ is the noise with $\|Z\|_2 \le \sigma$. Let $X^* \in \mathbb{R}^n$ be the solution to ConvexOPT($A$, $Y$, $\sigma$), where ConvexOPT(·) is defined in Section III-B. Assuming $A/\sqrt{m}$ satisfies RIP, the following theorem is proved in [28].

Theorem 3: The solution $X^*$ obeys

$$\|X - X^*\|_2 \le C_1 d_k(X)/\sqrt{k} + C_2\sigma/\sqrt{m}, \qquad (4)$$

where $C_1$ and $C_2$ are constants and $d_k(X)$ is defined in (1).

Appendix B 
Proof of Theorem 1

We show an intermediate lemma before the proof of Theorem 1. Let $Z \in \mathbb{R}^m$ be such that each of its components $Z(i)$ is i.i.d. $\sim \mathcal{N}(0, 1)$. Then we have

Lemma 4: The $\ell_2$-norm $\|Z\|_2 \le \sqrt{2m}$ with probability at least $1 - e^{-0.15m}$.
Proof: For any $i \ne j$, $Z(i)$ and $Z(j)$ are independent and normally distributed. Then the probability density function of $X = Z(i)^2 + Z(j)^2$ is $f_X(x) = e^{-x/2}/2$ for $x \ge 0$. Thus, $E(e^{X/4}) = \int_0^{+\infty} e^{-x/4}/2\,dx = 2$.

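Without loss of generality we assume ADMOT chooses $m$ as an even integer. Then we have:

$$
\begin{aligned}
\Pr(\|Z\|_2^2 > 2m) &= \Pr\Big(\sum_{i=1}^{m} Z(i)^2/4 > m/2\Big) \\
&\le E\Big(e^{\sum_{i=1}^{m} Z(i)^2/4}\Big)\Big/e^{m/2} && \text{(Markov inequality)} \\
&= \prod_{i=1}^{m/2} E\Big(e^{Z(2i-1)^2/4 + Z(2i)^2/4}\Big)\Big/e^{m/2} && \text{(independence)} \\
&= 2^{m/2}/e^{m/2} \le e^{-0.15m}.
\end{aligned}
$$

□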
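The final step of the proof relies on $(2/e)^{m/2} \le e^{-0.15m}$, which holds because $(\ln 2 - 1)/2 \approx -0.153$. The sketch below checks this numerically and compares it with a small Monte Carlo estimate of the tail. The function names, the sample sizes, and the seed are assumptions of this illustration.

```python
import random
from math import exp

def markov_bound(m):
    """(2/e)^(m/2), the bound obtained in the last line of the proof."""
    return (2.0 / exp(1.0)) ** (m / 2)

def empirical_tail(m, trials, rng):
    """Fraction of trials in which ||Z||_2^2 > 2m for Z with i.i.d. N(0,1) entries."""
    hits = 0
    for _ in range(trials):
        if sum(rng.gauss(0.0, 1.0) ** 2 for _ in range(m)) > 2 * m:
            hits += 1
    return hits / trials

bound_ok = all(markov_bound(m) <= exp(-0.15 * m) for m in (2, 10, 100))
tail_estimate = empirical_tail(40, 2000, random.Random(0))
```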



Then we have:

Proof of the scaling law in Theorem 1: Without loss of generality, we first consider a sub-problem (i.e., an easier problem): Assuming $\mathrm{Re}(H - \hat{H})$ is a $(k, \sqrt{k}C)$-variation and $\mathrm{Im}(H - \hat{H})$ is an all-zero vector, what is the minimum number of time slots required to find $\mathrm{Re}(H^*)$ such that $\|\mathrm{Re}(H^*) - \mathrm{Re}(H)\|_2^2 = O(1)$?

Assume $T$ time slots are used for estimating $\mathrm{Re}(H)$. For any $s = 1, 2, \ldots, T$ and $i = 1, 2, \ldots, n$, in the $s$'th time slot let $S_i$ send $A(s, i) \in \mathbb{C}$. Here $A$ is a $T \times n$ complex matrix whose $(s, i)$'th component is $A(s, i)$.

Let $Y(s) \in \mathbb{C}$ be the data received by $R$ in the $s$'th time slot, and let $Y \in \mathbb{C}^T$ be the length-$T$ vector whose $s$'th component is $Y(s)$. Thus $Y = AH$. Note that we assume that there is no noise here, which only reduces the complexity of estimating $H$.

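Since $\hat{H}$ and $A$ are known by $R$ a priori, the original problem is equivalent to estimating $\Delta = H - \hat{H}$ from $D = Y - A\hat{H} = A(H - \hat{H})$. Due to $\mathrm{Im}(H) = \mathrm{Im}(\hat{H})$, we have $D = A(\mathrm{Re}(H) - \mathrm{Re}(\hat{H}))$. Thus the problem is equivalent to estimating $\mathrm{Re}(\Delta)$ by using $\mathrm{Re}(D)$ and $\mathrm{Im}(D)$, which comprise $2T$ linear samples (over $\mathbb{R}$) of $\mathrm{Re}(\Delta)$.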

A recent result [30] in the field of compressive sensing proves that, provided $d_k(\mathrm{Re}(\Delta)) \le C\sqrt{k}$ for some constant $C$, at least $\Theta(k\log((n + 1)/k))$ linear samples (over $\mathbb{R}$) are required for reliably finding $\Delta^* \in \mathbb{R}^n$ such that $\|\mathrm{Re}(\Delta) - \Delta^*\|_2^2 \le O(1)$. Thus we have $T \ge \Theta(k\log((n + 1)/k))$.

Thus, we have proved the complexity of the easier problem. For the original problem, which considers random noise and the variations of the imaginary parts of the channel gains, the complexity can only be higher. □

Proof of the achievability in Theorem 1: Recall that the constants $C_0$, $C_1$, and $C_2$ are defined in Appendix A. Note that since $m$ satisfies $m \ge C_0 k\log((n + 1)/k)$, with overwhelming probability (i.e., $1 - O(2^{-n})$), the matrix $\Phi_m$ satisfies RIP of order $k$ (Appendix A). We henceforth assume it is true.

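We first analyze $\mathrm{Re}(D) = \Phi_m \mathrm{Re}(H - \hat{H}) + Z$, where $Z \in \mathbb{R}^m$ is the noise term. Using Lemma 4 we have $\Pr(\|Z\|_2 > \sqrt{2m}) \le e^{-0.15m}$. We henceforth assume $\|Z\|_2 \le \sqrt{2m}$.

Thus, using Theorem 3 with noise bound $\sqrt{2m}$, the vector $\mathrm{Re}(\Delta^*)$ satisfies $\|\mathrm{Re}(\Delta^*) - \mathrm{Re}(H - \hat{H})\|_2 \le C_1 d_k(\mathrm{Re}(H - \hat{H}))/\sqrt{k} + \sqrt{2}C_2$.

Since the state variation $H - \hat{H}$ is $(k, \delta\sqrt{k})$-sparse, after setting $H^* = \hat{H} + \Delta^*$ we have $\|\mathrm{Re}(H^* - H)\|_2 \le C_1\delta + \sqrt{2}C_2$.

Similarly we have $\|\mathrm{Im}(H^* - H)\|_2 \le C_1\delta + \sqrt{2}C_2$. In the end, we have $\|H^* - H\|_2 \le \sqrt{2}C_1\delta + 2C_2$. This completes the proof. □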
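For completeness, the final inequality follows from the two componentwise bounds by splitting the complex error into its real and imaginary parts. Spelled out, using only the bounds stated in the proof:

```latex
\begin{align*}
\|H^* - H\|_2
  &= \sqrt{\|\mathrm{Re}(H^* - H)\|_2^2 + \|\mathrm{Im}(H^* - H)\|_2^2} \\
  &\le \sqrt{2\,\bigl(C_1\delta + \sqrt{2}\,C_2\bigr)^2} \\
  &= \sqrt{2}\,C_1\delta + 2\,C_2 .
\end{align*}
```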

Appendix C 
Proof of Theorem 2

We first show an intermediate lemma. Let $V \in \mathbb{R}^n$ be a fixed vector, and $R \in \mathbb{R}^n$ be a vector of random variables. For each component $R(i)$ we have $\Pr(R(i) = 1) = \Pr(R(i) = -1) = 0.5$, and the $R(i)$ are i.i.d. for all $1 \le i \le n$. Let $\langle V, R\rangle = \sum_{i=1}^{n} V(i)R(i)$ denote the inner product between $V$ and $R$.






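Lemma 5: For $\langle V, R\rangle^2$, the expectation $E(\langle V, R\rangle^2)$ is $\|V\|_2^2$ and the variance $\mathrm{Var}(\langle V, R\rangle^2)$ is no more than $2\|V\|_2^4$.

Proof: We have

$$
\begin{aligned}
\langle V, R\rangle^2 &= \sum_{i} R(i)^2 V(i)^2 + \sum_{i \ne j} R(i)R(j)V(i)V(j) \\
&= \sum_{i} V(i)^2 + \sum_{i < j} R(i)R(j)V(i)V(j) + \sum_{i > j} R(i)R(j)V(i)V(j).
\end{aligned}
$$

Since $E\big(\sum_{i \ne j} R(i)R(j)V(i)V(j)\big) = 0$, we have $E(\langle V, R\rangle^2) = \|V\|_2^2$.

We have:

$$
\begin{aligned}
E(\langle V, R\rangle^4) &= \|V\|_2^4 + E\Big(\Big(\sum_{i \ne j} R(i)R(j)V(i)V(j)\Big)^2\Big) \\
&= \|V\|_2^4 + 2\sum_{i \ne j} V(i)^2 V(j)^2 \\
&\le \|V\|_2^4 + 2\Big(\sum_{i} V(i)^2\Big)\Big(\sum_{j} V(j)^2\Big).
\end{aligned}
$$

Thus $\mathrm{Var}(\langle V, R\rangle^2) = E(\langle V, R\rangle^4) - E(\langle V, R\rangle^2)^2$ is no more than $2\|V\|_2^4$. □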
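For small $n$ the two moments in Lemma 5 can be checked exactly by enumerating all $2^n$ sign vectors. In the sketch below the function name is hypothetical. The enumeration agrees with the closed form $\mathrm{Var}(\langle V, R\rangle^2) = 2(\|V\|_2^4 - \|V\|_4^4)$, which follows from expanding the fourth moment and is never larger than $2\|V\|_2^4$.

```python
from itertools import product

def moments_of_squared_inner_product(v):
    """Exact mean and variance of <V, R>^2 over all equiprobable sign vectors R."""
    squares = [sum(vi * ri for vi, ri in zip(v, signs)) ** 2
               for signs in product((-1, 1), repeat=len(v))]
    mean = sum(squares) / len(squares)
    var = sum((s - mean) ** 2 for s in squares) / len(squares)
    return mean, var

def closed_form_variance(v):
    """2 * (||V||_2^4 - ||V||_4^4)."""
    return 2 * (sum(x * x for x in v) ** 2 - sum(x ** 4 for x in v))

mean_a, var_a = moments_of_squared_inner_product([1, 2, 3])
mean_b, var_b = moments_of_squared_inner_product([1, 1, 1])
```

For `[1, 2, 3]` the mean is $\|V\|_2^2 = 14$, and for both vectors the enumerated variance matches `closed_form_variance`.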
Then we have: 

Proof of Theorem 2: Let $\Delta = H - \hat{H}$, $\varphi_R = \|\mathrm{Re}(\Delta)\|_2$, $\varphi_I = \|\mathrm{Im}(\Delta)\|_2$ and thus $\varphi = \|\Delta\|_2 = \sqrt{\varphi_R^2 + \varphi_I^2}$. Without loss of generality, we first analyze $\|\mathrm{Re}(D_2)\|_2$.

From the definition, $\mathrm{Re}(D_2) = U_R + Z_R$, where each component of $Z_R \in \mathbb{R}^d$ is i.i.d. $\sim \mathcal{N}(0, 1)$ and $U_R \in \mathbb{R}^d$ is $\Phi_{(m,2)}\Delta_R$, with $\Delta_R = \mathrm{Re}(\Delta)$. Using Lemma 5, for $i = 1, 2, \ldots, d$, we have $E(U_R(i)^2) = \varphi_R^2$ and $\mathrm{Var}(U_R(i)^2) \le 2\varphi_R^4$. Note that each component of $\Phi$ is i.i.d.; the noise term is i.i.d. $\sim \mathcal{N}(0, 1)$; $U_R(i)$ and $U_R(j)$ are independent for all $i \ne j$. Thus, we can apply the Chernoff bound (on discrete bounded random variables) [31] and get $\Pr\big(\big|\sum_{i=1}^{d}(U_R(i)^2 - \varphi_R^2)/(n\varphi_R^2)\big| > d\varphi_R^2/(2n\varphi_R^2)\big) < 2e^{-d^2/16}$. It is equivalent to

$$\Pr\big(d\varphi_R^2/2 \le \|U_R\|_2^2 \le 3d\varphi_R^2/2\big) > 1 - 2e^{-d^2/16}.$$



Using Lemma 4, we have $\Pr(\|Z_R\|_2^2 > 2d) < e^{-0.15d}$.

Similarly, assuming $\mathrm{Im}(D_2) = U_I + Z_I$, we have

$$\Pr\big(d\varphi_I^2/2 \le \|U_I\|_2^2 \le 3d\varphi_I^2/2\big) > 1 - 2e^{-d^2/16},$$
$$\Pr(\|Z_I\|_2^2 > 2d) < e^{-0.15d}.$$

Using the union bound [27] for $U_R$ and $U_I$, we have

$$\Pr\big(d\varphi^2/2 \le \|U_R + jU_I\|_2^2 \le 3d\varphi^2/2\big) > 1 - 4e^{-d^2/16}.$$

Similarly, by the union bound for $Z_R$ and $Z_I$ we can derive

$$\Pr(\|Z_R + jZ_I\|_2^2 > 4d) < 2e^{-0.15d}.$$

Note that $D_2 = (U_R + jU_I) + (Z_R + jZ_I)$. Using the triangle inequality, the event $\|D_2\|_2^2 > d(\sqrt{3/2}\,\varphi + 2)^2$ happens with probability at most $4e^{-d^2/16} + 2e^{-0.15d} = O(e^{-0.15d})$.

Assuming $\varphi^2 > 8$, the event $\|D_2\|_2^2 < d(\varphi/\sqrt{2} - 2)^2$ happens with probability at most $4e^{-d^2/16} + 2e^{-0.15d} = O(e^{-0.15d})$. □
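The two thresholds in the proof bracket $\|D_2\|_2^2$ between $d(\varphi/\sqrt{2} - 2)^2$ and $d(\sqrt{3/2}\,\varphi + 2)^2$. A small simulation can illustrate this window. The sketch assumes a $d \times n$ matrix with i.i.d. $\pm 1$ entries (as in Lemma 5) and unit-variance Gaussian noise on the real and imaginary parts; the function names, sizes, and seed are hypothetical.

```python
import random
from math import sqrt

def residual_energy(delta, d, rng):
    """||D_2||_2^2 for D_2 = Phi * Delta + Z, with a d x n matrix Phi of i.i.d.
    +/-1 entries and N(0,1) noise on each real and imaginary part."""
    total = 0.0
    for _ in range(d):
        u = sum(rng.choice((-1, 1)) * x for x in delta)
        z = complex(rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0))
        total += abs(u + z) ** 2
    return total

def energy_window(norm_delta, d):
    """Lower and upper thresholds from the proof (needs norm_delta^2 > 8)."""
    lower = d * (norm_delta / sqrt(2) - 2) ** 2
    upper = d * (sqrt(1.5) * norm_delta + 2) ** 2
    return lower, upper

rng = random.Random(3)
raw = [complex(rng.uniform(-1, 1), rng.uniform(-1, 1)) for _ in range(50)]
scale = 10.0 / sqrt(sum(abs(x) ** 2 for x in raw))
delta = [scale * x for x in raw]          # variation with ||Delta||_2 = 10
d = 200
energy = residual_energy(delta, d, rng)
lower, upper = energy_window(10.0, d)
```

Since $E\|D_2\|_2^2 = d(\varphi^2 + 2)$ under these assumptions, observing $\|D_2\|_2^2$ gives the receiver a coarse estimate of $\varphi$, which is how the window is used in the proof.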



